WHAT IS CLAIMED IS:

1. A method for analyzing a potential cause of a change in a service, wherein
service quality of the service is monitored, usage of the service is measured, and service events
are detected, the method comprising:

determining a service change time window based at least in part upon a change in
service quality between a first working state and a second, non-working state, and upon a change
in service usage amount, the service change time window encompassing at least part of a service
outage;

retrieving data representing a detected event and a time in which the event
occurred; and

computing a probability that the detected event caused the service change based at
least in part on a correlation between the event time and the service change time window.

2. The method of claim 1, wherein determining the service change time window
comprises determining a service failure time window based upon the change in service quality
and narrowing the service failure time window to the service change time window based upon
the service usage amount measured during that service failure time window.

3. The method of claim 2, wherein the service quality is monitored through
periodic polling of the service quality, and comprising determining the service failure time
window as bounded by a polled point of the first working state and a polled point of the second,
non-working state.
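The window determination of claims 2 and 3 can be illustrated with a minimal Python sketch. It assumes that quality polls arrive as (time, is_working) pairs and usage as (time, amount) samples, and it treats a drop of usage below a fixed threshold as the usage-based narrowing signal. The names Window, failure_window, narrow_window and drop_threshold are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """A closed time interval [start, end] (hypothetical helper type)."""
    start: float
    end: float


def failure_window(polls):
    """polls: list of (time, is_working) pairs sorted by time.

    Returns the service failure time window bounded by the last poll in the
    working state and the first poll in the non-working state, or None if no
    working-to-non-working transition was polled.
    """
    for (t0, ok0), (t1, ok1) in zip(polls, polls[1:]):
        if ok0 and not ok1:
            return Window(t0, t1)
    return None


def narrow_window(window, usage, drop_threshold):
    """usage: list of (time, amount) samples sorted by time.

    Narrows the failure window to the pair of consecutive usage samples inside
    it where usage falls from at or above drop_threshold to below it. If usage
    shows no such drop, the polled failure window is returned unchanged.
    """
    inside = [(t, u) for t, u in usage if window.start <= t <= window.end]
    for (t0, u0), (t1, u1) in zip(inside, inside[1:]):
        if u0 >= drop_threshold and u1 < drop_threshold:
            return Window(t0, t1)
    return window


# Quality polled every 60 s; usage metered every 15 s (illustrative values).
polls = [(0, True), (60, True), (120, False), (180, False)]
usage = [(60, 100), (75, 98), (90, 97), (105, 10), (120, 5)]
coarse = failure_window(polls)
print(coarse)                            # Window(start=60, end=120)
print(narrow_window(coarse, usage, 50))  # Window(start=90, end=105)
```

The polling interval bounds the failure window coarsely; the denser usage samples then locate the change more precisely inside it.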

4. The method of claim 1, wherein computing the probability comprises
computing the probability using at least in part a time weighting function which decreases
exponentially with the distance between the event time and the service change time window.
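One possible form of the time weighting function of claim 4 is sketched below in Python, assuming the weight is 1 for events inside the service change time window and decays as exp(-d / tau) with the distance d to the nearest window edge. The decay constant tau and the function name time_weight are hypothetical; the claim only requires an exponential decrease with distance.

```python
import math


def time_weight(event_time, window_start, window_end, tau):
    """Time weighting function (sketch).

    Returns 1.0 for an event inside the service change time window and
    exp(-d / tau) otherwise, where d is the distance from the event to the
    nearest edge of the window. tau (same units as the times) is a
    hypothetical tuning constant controlling how fast the weight decays.
    """
    if event_time < window_start:
        distance = window_start - event_time
    elif event_time > window_end:
        distance = event_time - window_end
    else:
        distance = 0.0
    return math.exp(-distance / tau)


print(time_weight(100, 90, 105, tau=30))  # 1.0 (inside the window)
print(time_weight(60, 90, 105, tau=30))   # exp(-1), about 0.368
```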
5. The method of claim 1, comprising determining whether one or more other
events of a type identical to the detected event occurred, and wherein computing the probability
comprises computing the probability using at least in part a false occurrence weighting function
which decreases the probability of the detected event as the cause of the service change for
instances in which the detected event occurred outside the service change time window.
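A false occurrence weighting function as in claim 5 can be sketched in Python, assuming a geometric penalty: every occurrence of an event type outside the service change time window multiplies that type's weight by a factor alpha between 0 and 1. The geometric form, alpha, and the name false_occurrence_weights are illustrative assumptions, not the claimed formula.

```python
from collections import Counter


def false_occurrence_weights(events, window_start, window_end, alpha=0.5):
    """False occurrence weighting function (sketch).

    events: list of (event_type, time) pairs observed around the outage.
    Each occurrence of an event type outside the service change time window
    multiplies that type's weight by alpha (0 < alpha < 1), so event types
    that also fire when no service change happens become less likely causes.
    """
    outside = Counter(
        etype for etype, t in events if not (window_start <= t <= window_end)
    )
    return {etype: alpha ** outside[etype] for etype, _ in events}


events = [("link_down", 95), ("cpu_high", 10), ("cpu_high", 50), ("cpu_high", 100)]
print(false_occurrence_weights(events, 90, 105))
# {'link_down': 1.0, 'cpu_high': 0.25}
```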

6. The method of claim 1, comprising storing historical data associating
occurrences of prior events with prior service changes, and wherein computing the probability
that the detected event caused the service change comprises computing the probability based at
least in part on the historical data.

7. The method of claim 6, wherein storing historical data comprises storing data
representing instances in which prior events occurred within prior service change time windows,
and wherein computing the probability that the detected event caused the service change
comprises using at least in part a positive occurrence weighting function which increases the
probability of the detected event as the cause of the service change based on instances in the
historical data in which a prior event of a type identical to the detected event occurred within a
prior service change time window.

8. The method of claim 6, wherein storing historical data comprises storing data
representing instances in which prior events were identified as having caused prior service
changes, and wherein computing the probability that the detected event caused the service
change comprises using at least in part a historical weighting function which increases the
probability of the detected event as the cause of the service change based on instances in the
historical data in which a prior event of a type identical to the detected event was identified as
having caused a prior service change.
9. The method of claim 1, comprising retrieving data representing a plurality of
detected events and corresponding event times, and wherein computing the probability comprises
computing probabilities for each of the plurality of detected events.

10. The method of claim 9, wherein computing probabilities comprises
computing the probabilities such that the total of all computed probabilities is 1.
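The normalization of claim 10 amounts to dividing each event's unnormalized score by the sum of all scores. A minimal Python sketch follows; the function name normalize and the use of a dictionary keyed by event are assumptions for illustration.

```python
def normalize(scores):
    """Scale non-negative per-event scores so the probabilities sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one event needs a positive score")
    return {event: score / total for event, score in scores.items()}


print(normalize({"link_down": 3.0, "cpu_high": 1.0}))
# {'link_down': 0.75, 'cpu_high': 0.25}
```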

11. The method of claim 1, wherein the service comprises service over a 
communication network and wherein the detected event comprises a network event. 

12. The method of claim 1, wherein the service comprises service provided by an
application program and wherein the detected event comprises an application program event.

13. The method of claim 1, wherein the service change is a service outage,
comprising determining the service change time window as a change in service quality from the
first working state to the second, non-working state.

14. The method of claim 1, wherein the service change is a service recovery,
comprising determining the service change time window as a change in service quality from the
second, non-working state to the first, working state.

15. The method of claim 1, wherein determining the service change time window 
comprises detecting a change in service quality by detecting a step change in measured usage. 
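Step-change detection in measured usage (claim 15) can be approximated by comparing the mean usage in a short window before each sample with the mean in a window after it. The Python sketch below assumes evenly spaced samples; the window length, the 70 percent relative-drop threshold and the name detect_step_change are illustrative assumptions, not values from the claims.

```python
def detect_step_change(samples, window=3, min_relative_drop=0.7):
    """Return the first index i at which the mean of samples[i:i + window] is
    at least min_relative_drop (as a fraction) below the mean of
    samples[i - window:i], or None if no such step exists.
    """
    for i in range(window, len(samples) - window + 1):
        before = sum(samples[i - window:i]) / window
        after = sum(samples[i:i + window]) / window
        if before > 0 and (before - after) / before >= min_relative_drop:
            return i
    return None


usage = [100, 102, 98, 101, 20, 18, 22, 19]  # e.g. messages per minute
print(detect_step_change(usage))  # 4
```

Averaging over several samples on each side makes the detector less sensitive to a single noisy reading than a plain sample-to-sample comparison.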

16. A method for analyzing potential causes of a service change, the method
comprising:

determining a service change time window encompassing a change of service
between a first working state and a service outage, the service change being determined at least
in part based on measured service usage levels;
detecting occurrences of a set of events within a given time prior to and during the
service change time window, each occurrence of an event being associated with a time at which
the event occurred; and

computing a probability distribution for the set of events, which probability
distribution determines for each event in the set the probability that the detected event caused the
service change, the probability distribution being based at least in part on relations between the
time of each event occurrence and the service change time window.

17. The method of claim 16, wherein computing the probability distribution for
the set of events comprises computing the probability distribution using a first weighting
function which is the product of two or more second weighting functions.

18. The method of claim 16, wherein the two or more second functions are
selected from the group consisting of:

a time weighting function which decreases exponentially the probability of a
given event as the cause of the service change with the distance between the given event time and
the service change time window;


a false occurrence weighting function which decreases the probability of a given
event as the cause of the service change for instances in which events of the same type as the
given event occurred outside the service change time window;

a positive occurrence weighting function which increases the probability of a
given event as the cause of the service change based on instances stored in a historical database
in which events of the same type as the given event occurred within a prior service change time
window; and
a historical weighting function which increases the probability of a given event as
the cause of the service change based on instances in the historical database in which events of
the same type as the given event were identified as having caused a prior service outage.

19. The method of claim 18, wherein the step of computing the probability
distribution comprises using a first weighting function which is the product of the time weighting
function, false occurrence weighting function, positive occurrence weighting function, and user
weighting function.
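Claims 17 and 19 describe a first weighting function formed as the product of several second weighting functions. The Python sketch below assumes each second weighting function has already been evaluated into a table of per-event weights, multiplies the tables event by event, and normalizes the products into a distribution. The name combined_distribution and all weight values are hypothetical illustrations.

```python
import math


def combined_distribution(event_ids, weight_tables):
    """event_ids: events detected around the service change.
    weight_tables: one {event_id: weight} dict per second weighting function
    (for example time, false occurrence, positive occurrence, user).

    Multiplies the second weighting functions into a first weighting
    function for each event, then normalizes the products to sum to 1.
    """
    products = {
        e: math.prod(table[e] for table in weight_tables) for e in event_ids
    }
    total = sum(products.values())
    return {e: p / total for e, p in products.items()}


# Illustrative per-event weights (made-up values).
w_time = {"link_down": 1.0, "cpu_high": 0.5}
w_false = {"link_down": 1.0, "cpu_high": 0.25}
w_pos = {"link_down": 2.0, "cpu_high": 1.0}
w_user = {"link_down": 1.0, "cpu_high": 1.0}
dist = combined_distribution(["link_down", "cpu_high"], [w_time, w_false, w_pos, w_user])
print(dist)
```

Because the weights are multiplied, a near-zero weight from any single function (for instance, an event far outside the window) suppresses that event regardless of the other factors.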

20. The method of claim 16, comprising monitoring service quality, and wherein
determining the service change time window comprises determining a service failure time
window based upon a change in monitored service quality and narrowing the service failure time
window to the service change time window based upon the service usage amount measured
during that service failure time window.

21. The method of claim 20, wherein the service quality is monitored through
periodic polling of the service quality, and comprising determining the service failure time
window as bounded by a polled point of the first working state and a polled point of the second,
non-working state.

22. The method of claim 16, comprising computing the probability distribution
such that the total of all probabilities in the distribution is 1.

23. The method of claim 16, wherein the service comprises service over a
communication network and wherein the detected events comprise network events.

24. The method of claim 16, wherein the service comprises service provided by
an application program and wherein the detected events comprise application program events.
25. The method of claim 16, wherein the service change is a service outage,
comprising determining the service change time window as a change in service from the first
working state to the second, non-working state.

26. The method of claim 16, wherein the service change is a service recovery,
comprising determining the service change time window as a change in service from the second,
non-working state to the first, working state.

27. The method of claim 1, wherein determining the service change time window 
comprises detecting a step change in measured usage. 

28. A network monitoring system comprising:
a service monitor for monitoring quality of service on the network;

a usage meter for measuring usage of the network;

an event detector for detecting network events and times at which the network
events occur; and

a probable cause engine, coupled to receive data from the service monitor, usage
meter, and the event detector, for:

setting a service change time window based upon data received from the
service monitor or usage meter, the service change time window encompassing at least part of an
occurrence of a service outage in the network; and

determining which of the network events detected by the event detector is
the most likely cause of a service change based at least in part on the relations of the detected
network event times to the service change time window.

29. A computer readable medium storing program code for, when executed,
causing a computer to perform a method for analyzing a potential cause of a change in a
service, wherein service quality of the service is monitored, usage amount of the service is
measured, and service events are detected, the method comprising:

determining a service change time window based at least in part upon a change in
service quality between a first working state and a second, non-working state, and upon a change
in service usage amount, the service change time window encompassing at least part of a service
outage;

retrieving data representing a detected event and a time in which the event
occurred; and

computing a probability that the detected event caused the service change based at
least in part on a correlation between the event time and the service change time window.

30. A method for quantifying the effect of an outage in a service over a first
period of time, the method comprising:

measuring usage of the service over time;
defining a cost of outage time window comprising the first time period and a
second time period following the first time period; and

computing a cost of outage as the difference between the measured service usage
during the cost of outage time window and service usage measured during a comparison
window, the comparison window being substantially equal in time to that of the cost of outage
time window and reflecting a similar period of service activity as that of the cost of outage time
window without having a service outage.
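The cost-of-outage computation of claim 30 can be sketched in Python as the usage in a comparison window minus the usage measured over the cost of outage time window, both sampled at the same interval. The choice of "same hours one week earlier" as the comparison window, the sample values, and the name cost_of_outage are illustrative assumptions.

```python
def cost_of_outage(measured, comparison):
    """measured: service usage per interval over the cost of outage time window
    (the outage period followed by the recovery period).
    comparison: usage per interval over an equally long window with similar
    activity and no outage, for example the same hours one week earlier.
    Returns the lost usage in units of service usage.
    """
    if len(measured) != len(comparison):
        raise ValueError("comparison window must match the cost of outage window")
    return sum(comparison) - sum(measured)


measured = [100, 10, 0, 60, 90]  # requests per minute during and after the outage
comparison = [100, 100, 100, 100, 100]
lost = cost_of_outage(measured, comparison)
print(lost)  # 240
```

Multiplying the result by a monetary value per unit of usage gives the monetary form described in claims 36 and 37.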

31. The method of claim 30, comprising determining the second period of time
to be a time in which the measured service usage returns to within a given percentage of a normal
service usage.

Express Mail No. EL595664236US 47 
BRMFS1 205409v3 



3882/3 

32. The method of claim 30, comprising determining the second period of time to
be the shorter of (1) a time in which the measured service usage returns to within a given
percentage of a normal service usage and (2) a maximum time period.
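The second period of claim 32 is the shorter of the recovery time and a maximum period. A minimal Python sketch, assuming usage is sampled per fixed interval after the outage ends and that "within a given percentage" means an absolute deviation of at most tolerance times normal usage; the name second_period_length and the sample values are hypothetical.

```python
def second_period_length(post_outage_usage, normal_usage, tolerance, max_intervals):
    """post_outage_usage: usage per interval starting when the outage ends.

    Returns the number of intervals before usage first comes back to within
    `tolerance` (a fraction, e.g. 0.1 for 10 percent) of normal_usage, capped
    at max_intervals, i.e. the shorter of the recovery time and the maximum.
    """
    for i, usage in enumerate(post_outage_usage[:max_intervals]):
        if abs(usage - normal_usage) <= tolerance * normal_usage:
            return i
    return max_intervals


print(second_period_length([40, 70, 95, 100], 100, 0.1, 10))  # 2
print(second_period_length([40, 70, 95, 100], 100, 0.1, 1))   # 1
```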

33. The method of claim 30, wherein computing the cost of outage comprises
computing the difference in units of service usage.

34. The method of claim 33, wherein the service is a communication service
conveying a plurality of messages, the method comprising computing the cost of outage in
numbers of messages conveyed.

35. The method of claim 33, wherein the service is a network server providing
data items in response to requests therefor, the method comprising computing the cost of outage
in numbers of requests received or data items provided by a server on the network.

36. The method of claim 33, comprising converting the computed units of cost of
service outage to a monetary value.

37. The method of claim 36, wherein converting the computed units of cost of
service outage comprises multiplying the units of cost of service outage by a first monetary value
per unit of usage.

38. The method of claim 30, comprising comparing the cost of outage to a second
cost of outage value for a different service and prioritizing the outages based on the compared
costs.

39. The method of claim 30, comprising computing the difference between the
monitored service usage following the cost of outage time window and a normal service usage
level to thereby measure a long term effect of the service outage.
40. A method for quantifying the effect of an outage in a service, the method
comprising:

measuring usage amounts of the service during a period of the service outage and
a second period following the service outage;

comparing the measured usage amounts to normal usage amounts measured under
similar service conditions for a similar period of time where no service outage occurs; and
determining a level of loss of service due to the service outage based on the
comparison.

41. The method of claim 40, comprising defining the second period as the shorter 
of a time period in which measured service usage amounts return to within a given range of 
normal usage amounts and a predefined maximum time period. 

42. The method of claim 40, wherein measuring service usage amounts comprises
measuring service usage amounts in terms of units of service usage.

43. The method of claim 42, wherein the service is a communication service
conveying a plurality of messages, comprising measuring service usage amounts in terms of
number of messages conveyed by the system.

44. The method of claim 42, wherein the service is a network server providing
data items in response to requests therefor, comprising measuring service usage amounts in terms
of numbers of requests received or data items provided by a server on the network.

45. The method of claim 40, wherein determining the level of service loss
comprises determining that substantially no loss of service occurred due to the outage based on
the measured service usage amounts and normal service usage amounts being substantially equal.

46. The method of claim 40, comprising:
measuring service usage amounts during a third period following the second
period;

comparing the measured third period service usage amounts to normal usage
amounts measured under similar service conditions for a similar period of time; and

determining a long term effect on the service due to the service outage based on
the comparison.

47. A computer readable medium storing program code which, when executed,
causes a computer to perform a method for quantifying the effect of an outage in a service over a
first period of time, the method comprising:

measuring service usage over time;

defining a cost of outage time window comprising the first time period and a
second time period; and

computing a cost of outage as the difference between the measured level of
service usage during the cost of outage time window and a level of usage in a comparison
window, the comparison window being substantially equal in time to the cost of outage time
window and reflecting a similar period of service activity as the cost of outage time window
without having a service outage.

48. A method for predicting a cost of an outage of a service, the method
comprising:

measuring time duration for and service usage during the outage;
comparing the measured usage amounts to normal usage amounts measured under
similar service conditions for a similar period of time where no service outage occurs, to thereby
determine a usage loss amount; and
computing a predicted cost of the outage based at least upon a cost component,
the cost component comprising a function of the measured time of the outage and measured
usage loss amount.

49. The method of claim 48, comprising measuring service usage on an ongoing
basis and detecting the onset of the service outage using the measured service usage.

50. The method of claim 49, wherein detecting the onset of the service outage
comprises detecting a step change in service usage.

51. The method of claim 48, comprising monitoring quality of the service and
detecting the onset of a service outage based upon the service quality.

52. The method of claim 51, wherein monitoring service quality comprises
monitoring service quality through periodic polling of the service quality, and wherein detecting
the onset of a service outage comprises detecting the outage onset as bounded by a polled point
of a first, working state and a polled point of a second, non-working state.

53. The method of claim 48, wherein computing the predicted cost of the outage
comprises using a service demand cost component representing an effect on service usage based
upon the duration of an outage.

54. The method of claim 53, wherein using the service demand cost component
comprises multiplying the measured usage loss by a usage loss curve which is a function of time
duration of an outage and represents a predicted percentage usage due to an outage based on time
duration of the outage.
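The service demand cost component of claims 53 to 55 multiplies the measured usage loss by a value read from a usage loss curve indexed by outage duration. The Python sketch below assumes the curve is stored as (duration, fraction) points derived from prior outages and uses linear interpolation between them; the interpolation scheme, the curve values and the function names are hypothetical.

```python
from bisect import bisect_right


def usage_loss_fraction(curve, duration):
    """curve: (outage_duration, fraction) points sorted by duration, e.g.
    fitted from prior outages. Linearly interpolates between points and
    holds the end values outside the covered range (assumed scheme)."""
    durations = [d for d, _ in curve]
    if duration <= durations[0]:
        return curve[0][1]
    if duration >= durations[-1]:
        return curve[-1][1]
    i = bisect_right(durations, duration)
    (d0, f0), (d1, f1) = curve[i - 1], curve[i]
    return f0 + (f1 - f0) * (duration - d0) / (d1 - d0)


def service_demand_cost(measured_usage_loss, curve, duration):
    """Service demand cost component: measured usage loss times the curve value."""
    return measured_usage_loss * usage_loss_fraction(curve, duration)


curve = [(0, 0.0), (10, 0.2), (60, 0.5)]  # minutes -> fraction (made-up values)
print(service_demand_cost(1000, curve, 35))  # approximately 350
```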

55. The method of claim 54, comprising generating the usage loss curve using
historical data derived from prior service outages.
56. The method of claim 48, wherein computing the predicted cost of the outage
comprises using a customer retention cost component representing a number or percentage of
customers lost due to the outage.

57. The method of claim 48, wherein computing the predicted cost of the outage
comprises using an agreement penalty component representing penalties arising under one or
more service agreements due to a service outage.

58. The method of claim 48, wherein computing the predicted cost of the outage
comprises computing the cost in units of service usage.

59. The method of claim 58, comprising converting the computed units of
predicted cost to a monetary value by multiplying the units of predicted cost by a first monetary
value per unit of usage.

60. The method of claim 48, comprising comparing the predicted cost of service
outage to a second predicted cost of outage value for a different service and prioritizing the
outages based on the compared costs.

61. A network monitoring system comprising:

a usage meter for measuring usage of a service on the network;
an event detector for detecting network events and times at which the network
events occur;

a probable cause engine, coupled to receive data from the usage meter and the
event detector, for determining which of the network events detected by the event detector is the
most likely cause of a service outage based at least in part on the relations of the detected network
event times to a service change time window, the service change time window encompassing at
least part of an occurrence of the service outage in the network; and